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ABSTRACT 

The characterisation of ever smaller and fainter extrasolar planets requires an intricate un- 
derstanding of one's data and the analysis techniques used. Correcting the raw data at the 10""* 
level of accuracy in flux is one of the central challenges. This can be difficult for instruments that 
do not feature a calibration plan for such high precision measurements. Here, it is not always 
obvious how to de-correlate the data using auxiliary information of the instrument and it becomes 
paramount to know how well one can disentangle instrument systematics from one's data, given 
nothing but the data itself. We propose a non-parametric machine learning algorithm, based 
on the concept of independent component analysis, to de-convolve the systematic noise and all 
non-Gaussian signals from the desired astrophysical signal. Such a 'blind' signal de-mixing is 
commonly known as the 'Cocktail Party problem' in signal-processing. Given multiple simulta- 
neous observations of the same exoplanetary eclipse, as in the case of spectrophotometry, we show 
that we can often disentangle systematic noise from the original light curve signal without the 
use of any complementary information of the instrument. In this paper, we explore these signal 
extraction techniques using simulated data and two data sets observed with the Hubble-NICMOS 
instrument. Another important application is the de-correlation of the exoplanetary signal from 
time-correlated stellar variability. Using data obtained by the Kepler mission we show that the 
desired signal can be de-convolved from the stellar noise using a single time series spanning several 
eclipse events. Such non-parametric techniques can provide important confirmations of the exis- 
tent parametric corrections reported in the literature, and their associated results. Additionally 
they can substantially improve the precision exoplanetary light curve analysis in the future. 

Subject headings: methods: data analysis — methods; statistical — methods: data analysis — tech- 
niques: photometric — techniques: spectroscopic 



1. Introduction 

The field of transiting extrasolar planets and 
especially the study of their atmospheres is 
one of the youngest and most dynamic sub- 
jects in current astrophysics. Permanently at 
the edge of technical feasibility, we have come 
from the first radial velocity and transit detec- 
tions (iMavor fc QuelozlllQQSl: iMarcv et al. Ill998t 



Charbonneau et al. I 120001 ). via the first detec- 
tions of molecular features in hot- Jupiter atmo- 



spheres (ICharbonneau et al. 1120021 ) to ever more 



detailed characterisations of multitudes of systems 
( Grillmair et al. II2008I: ICharbonneau et al. Il2008t 



Snellen et al. I l2010bl: iBeanI l201lt ISwain etaL 



20081 l2009aUbl: iTinetti et al. 1120071 l2010h. With 



over 700 exoplanets discovered ([Schneider et al 



20111 ) and over 1200 exoplanetary candid ates that 



await confirmation ( Borucki et al. 1 120111 ). the fo 



cus of interest shifts from the detection to the 
characterisation of smaller and smaller targets. 
The governing factor of this progression is the 
precision at which we can control our instrument 
and/or stellar systematics, hence the accuracy 
with which we can analyse the data. 

To minimise the impact of the systematic 
noise components, different approaches have 
been proposed in the past. For space and 
ground-based observations, eg. Spitzer and 
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Hubble (eg. 


AeoI et al. 2010l: Beaulieu et al. 


2008. 2011; Charbonneau et al. 1 2002". '2005'. '20081 


Demine et al. 


2007: Gillon et al. 12010: Grillmair et al. 


20081: iKnutson et al. Il2007allb: ISine et al. 112011; 


Snellen et al. 


2010b; Bean et al. 12011; Swain et al. 


2008, 2009a^ 


: Tinetti et al. 2007, 2010l). in- 



strumental systematic noise has been approx- 
imated using parametric models, often based 
on auxiliary information (optical state vectors) 
such as instrumental temperature, orbital inclina- 
tion, inter and intra-pixel positions of the point- 
spread-function. Using optical state vectors to 
de-correlate one's d ata is an effective technique 
( Swain et al. ] l2008h . However for instruments 



that lack a calibration plan at the precision of 
10""*, the accuracy of the retrieved optical state 
vectors (e.g. sensor sampling rates) and the ad- 
equacy of the instrument model's definition it- 
self become difficult to determine. Some of the 
recent controversy over results reported by var- 
ious tea ms can be att r ibute d to this circum - 
stanc e (iKnutson et al. _ 2011 Stevenson et al 



2OIOI: iBeaulieu et al. I I2OIII: ISwain et al. I l2008t 



Gibson et al. ll201lHPont et al. l2010HHatzes et al 



2OIOI : iBrunt et al. 1120101 ) 

The situation is even further complicated 
by brightness variability of the planet hosting 
star, in particular for small, very active M-stars. 
Hence, it is important to work towards an al- 
ternative route to quantify or remove system- 
atic noise using non-parametric models that work 
as independent co nfirm ati ons of existing results. 



Carter fc wimTI tOQ^ . iThatte et al. I |2010l) 



Gibson et al. I ( 2011 ) and Waldmann et al. 1 (|20lll) 

have progressed towards non-parametric noise 
models and signal separation using wavelets, prin- 
cipal component, Gaussian processes and Fourier 
based analysis. 

In this publication, we propose a new non- 
parametric method to separate systematic noise 
from the desired lightcurve signal. Given mul- 
tiple lightcurves, observed simultaneously us- 
ing spectrographs for example, we can disentan- 
gle our desired astrophysical signal from other 
time-correlated or non-Gaussian systematic noise 
sources using un-supervised machine learning 
algorithms. We furthermore explore the de- 
correlation of individual time series spanning sev- 
eral consecutive eclipse events. The importance 
of this work lies with the fact that no additional 



knowledge of the system or the star is required, 
besides the observations themselves. Such non- 
parametric methods provide a potential new route 
to de-correlate one's date in the case where the 
instrument does not feature an adequate calibra- 
tion plan, the quality of the auxiliary informa- 
tion of the instrument is non-optimal or the host 
star shows significant activity. Such blind de- 
convolution techniques provide new insight and 
powerful validation of the established parametric 
instrumental models reported in the literature. 

Here we will briefly introduce the more gen- 
eral theory of blind-source separation and proceed 
with a description of the algorithm proposed. The 
efficiency of said algorithm is tested with two syn- 
thetic models and two HST/NICMOS data sets 
available in the public domain and one Kepler 
(Q2 & Q3 data release) time series featuring strong 
time-correlated stellar variability. Future publica- 
tions will focus on in-depth discussion of the pro- 
posed algorithm to specific data sets. 

2. Background: the Cocktail Party Prob- 
lem 

In this section we will briefly describe the fun- 
damental concepts on which the following analysis 
is based. Readers familiar with independent com- 
ponent analysis may proceed straight to section [3l 

Let us consider the analogy of three people talk- 
ing simultaneously in one room. The speech sig- 
nals of these people are denoted by si{t), S2{t) 
and S3{t). In the same room are three micro- 
phones recording the observed signals xi{t), 2:2 (i) 
and X3{t). The observed signals can be expressed 
in terms of the original speech signals: 



xi{t) = aiisi(t) + ai2S2{t) + ai3S3{t) (1) 

X2{t) = a2lSi{t) + a22S2{t) + a23S3{t) 



instead of assuming x{t) and s{t) to be proper time 
signals, we drop the time dependence and assume 
them to be random variables 



Xk 



akiSi+ak2S2+---+akNSN, 



for allk^ l,...,iV 
(2) 
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where aki is a weighting factor (in this case the 
square of the distance of the speakers to the mi- 
crophone) and k,l = 1,...,N are some real coeffi- 
cients with N being the maximum number of ob- 
served signals. The individual time series can also 
be expressed in terms of vectors where bold lower- 
case letters denote column vectors and upper-case 
letter denote matrices: 



= As 



(3) 



where the rows of x comprise the individual time 
series, x^, and similarly s is the signal matrix of 
the individual source signals sj. A is the 'mix- 
ing matrix' consisting of the weights a/^. Equa- 
tion [3] is also known as the instantaneous mix- 
ing model and often referre d to as the classi- 



cal 'Cocktail Party Pro blem' (jHwarinen fc Oia. 
2OOII : iHvvarinen Hl999l ). 



The challenge is to estimate the mixing matrix, 
A and its (pseudo)inverse the de-mixing matrix, 
W, 



W = A 



(4) 



given the observations contained in x without any 
additional prior knowledge of either A or s, or for 
some methods without restrictions of A & s. 

Several algorithms have been proposed to 
find the linear transformation of equation [3] 
Amongst t hese are principal component analy- 
sis (PCA) |Pearsonlll90lUManlv l ll993 IJolhffe I 



sis 



2OO2I : IPress et al Il20q7tl0ialll992h. f actor analy 



fFA'l (IJolliffe ll200^iHarman Ill967l). projecti 



pursuit rPP) (jFriedman II1987I: iHuber H 19851) and 



nent 


analvsis QCA) dComon 1994t Hvvarinen 


1999 


Hvvarinen et al. 1 


999l Hvvarinen & Oia 


2000 


Hvvarinen & Oia. 


2001:IComon & Jutten 


2000 


Stone l2004h. 



The underlying differences between PGA and 
FA on one hand and ICA and PP on the other are 
the underling assumptions on the probability dis- 
tribution functions (pdfs) of the signals comprising 
X. The former group assumes the signals to fol- 
low: 1) a Gaussian distribution whilst the latter 
assume the signals to be, 2) predominantly non- 
Gaussian or sparse with speci fic signatures in the 



spect ral domain (e.g. SMICA iDelabrouille et aC 



way we estimate our signal components. 

1) If the observed signals comprising x fol- 
low Gaussian distributions, we can define their 
statistics by the first and second statistical mo- 
ments (mean and covariance) only. Algorithms 
such as PGA and FA find a linear transformation 
from the correlated observed signals, x, to a set 
of uncorrelated signals, s. In this case, uncor- 
relatedness is equivalent to mutual independence 
and the source signals are at their most separated 
(please see Appendix |X] for a more in-depth dis- 
cussion on independence). Such a linear trans- 
formation is always possible and easily achieved 
using, for example, single va,lue d ecompositions 
fSVPI (|Pearsonlll90ll :lMMd7l[l994: JoUiffe 200i 



Press et al. 1 1200711 ^An application of PGA to 



exoplanetary l ight c urve de-trending is given by 



Thatte et all (|2010l ) 



2) In the case of the observed signals follow- 
ing non-Gaussian distributions, significant infor- 
mation is contained in the higher statistical mo- 
ments (skew & kurtosis) and it can be shown 
that uncorrelated signals (as produced by PGA & 
FA) are not necessarily mutually independent and 
hence not optimally separated from one another. 
Here uncorrelatedness is a weaker constraint than 
independence and it can be said that independent 
signals are always uncorrelated but not vice versa 
(see appendix As a consequence using PGA or 
FA algorithms may only yield a partially separated 
result for non-Gaussian sources. 

It is important to note that most observed sig- 
nals, astrophysical or stellar /instrumental noise, 
are predominantly non-Gaussian by nature. We 
can also state that most of these signals should be 
statistically independent from one another (e.g. 
an exoplanet light curve signal is independent 
of the systematic noise of the instrument with 
which it was recorded). These properties have 
led to a surge in IGA based analysis methods 
in current cosmology and extra-galactic astron- 
omy. Here IGA is used to separate the cosmic 
microwave background (GMB) or signatures of 
dist ant galaxies from their galactic foregrounds 
fe.g.lStivoh et al. II2OO6I: iMaino et al.1l2002ll2007t 



20031 ) . This results in significant differences in the 



Wang et al. I l2010h ". lAumont fc Macias-Perez 
(|2007l ) furthermore separates instrumental noise 
from the desired astrophysical signal. Other appli- 
cations include data-compression of sparse, large 
data sets to improve model fitting efficiencies (e.g. 
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Lu et al. |[200l iDelabrouille et al. II2003I ) 



2.1. ICA in the context of exoplanetary 
lightcurves 

In this publication, we focus on the applica- 
tion of ICA to exoplanetary lightcurve analysis. 
Let us consider multiple time series observations 
of the same exoplanetary eclipse signal either in 
parallel, by performing spectrophotometry with a 
spectrograph, or consecutive in time (as explained 
in section lO)) . 

Without excluding the most general case, let 
us focus on a time-resolved spectroscopic measure- 
ment of an exoplanetary eclipse. For most obser- 
vations, the signal recorded is a mixture of astro- 
physical signal, Gaussian (white) noise and sys- 
tematic noise components originating from instru- 
mental defects and other sources such as stellar 
activity and telluric fluctuations. We can there- 
fore write the individual time series as sum of the 
desired astrophysical signal, Sa, systematic (non- 
Gaussian) noise components, Ss„, and Gaussian 
noise, s^,„. We can now define the underlying lin- 
ear model of our time series data to be 



x{t) =aiSa{t) + (t) + a3Ssn2{t) + ... + S«,„(t) 

(5) 



or 



Xk 



(6) 



where N^n and N^n are the number of system- 
atic and white noise components respectively and 
= Nsn + Nwn + 1 assuming only one com- 
ponent is astrophysical. 

2.2. Demixing signals using ICA 

The basic assumptions of ICA are that the ele- 
ments comprising s, s/, are mutually independent 
random variables with probability densities, pi{si). 
We further assume that all (or at least one) of 
the probability densities, pi{-), are non-Gaussian. 
This non-Gaussianity is key since it allows the 
de-mixing matrix, W, to be estimated. From 
the central limit theorem, we know that a con- 
volution of any arbitrary probability distribution 
functions (pdfs) that feature a formal mean and 



variance, asymptotically approaches a Gaussian 



distribution in the limit of large N (jRilev et al 

In other words, the sum of any two non- 
Gaussian pdfs (ie. pi{-) and p;+i(-)) is more Gaus- 
sian than the respective original pdfs. There- 
fore by maximising the non-Gaussianity of the 
individual signa ls, we maximi s e their sta tistical 

19941 i Hvvarinen 



independenc e. (IComon . 
Hyviirincn fc Oia l l200d: 



Hvvarinen fc Qia 



Stone 



20041) . 



2001 



Koldovskv et al. 



Gomon fc Jutten 



1999; 



2006; 



200C; 



2.2.1. Information Entropy 

Although several measures of non-Gaussianity 



exist ( we refer the reader to [ Cichocki fc Amari . 
(|2002[) . IHvvarinen fc Oial (l200ql). IHvvarinen fc Qia 



20011 ) and IComon fc Jutten I (|2000[) for detailed 



summa ries), we here u se the concept of 'negen- 



tropy' (|Brillouin 1 119531) . Negentropy is derived 



from the basic information-theoretical concept of 
differential entropy. In information theory, in close 
analogy to thermodynamics, the entropy of a sys- 
tem is at its maximum when all data points are at 
its most random. In thermodynamics we measure 
the distribution of particles, in information theory 
it is the probability distribution of a random vari- 
able. From information theory we can derive the 
fundamental result that a Gaussian distribution 
has the largest entropy among all random vari- 
ables of equal variance and known mean. Hence, 
by minimising the entropy of a variable, we max- 
imise its non-Gaussianity. For a random vector y, 
with random variables j/^, i — 1, ...,ri, the entropy 
is given by 



H(y) = - j p(y)log2P(y)dy 



(7) 



where H(y) is th e differential or Shannon entropy 
( Shannonll 19481 ) and p(y) is the pdf of the random 
vector y. H(y) is at its minimum when p{y) is 
at its most non-Gaussian. We can now normalise 
equation [7] to yield the definition of negentropy: 



J(y) = }i{y gauss) - H(y) 



(8) 



where y gauss is a random Gaussian vector with the 
same covariance matrix as y. Now y is at its most 
non-Gaussian when J(y) is at its maximum. It is 
important to note that negentropy is insensitive to 
a multiplication by a scalar constant. This has the 
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important consequence of introducing a sign and 
scaling ambiguity into the retrieved signal compo- 
nents of s. We will discussed the implications of 
this limitation in section H) 

2.2.2. Contrast functions 

In practice it is very difficult to calculate the ne- 
gentropy of a system and various methods were de- 
vised to approximate J(y). The classic method is 
to measure the kurtosis of mean-subtracted y with 
unit variance. However, kurtosis is very prone to 
distortions by outliers in the data and hence lacks 
the robustness required a s measure of negentropy 



(jHyvarinen fc Oia. II2001I ). To overcome this lim 



itation, more robust measures of negentropy have 
been devise d. One can approxiniate negentropy by 
equation [9l ( Hvvarinen fc Oia. 200lJ : Hvvarinen 



19991 : IComon fc Jutten Il200d: IStone fl2004l ) 



J(y) (X (E[G(y)] - E[G(i/)] 



(9) 



where is a random Gaussian variable with zero 
mean and unit variance and G is a non-quadratic 
contrast function chosen to approximate the un- 
derlying probability distribution. There are usu- 
ally three types of contrast functions: Gi as gen- 
eral purpose function, G2 optimised for super- 
Gaussian (leptokurtic) distributions and G3 op- 



tions (Hvvarinen 


1999; 


Comon & Jutten 


I2OOOI) 



2001 



Gi{y) = —log[cosk{aiy)] 
ai 

-exp(-yV2) 



(10) 



G2iy) 
G3{y) 



1 



The choice of contrast function is only impor- 
tant if one wants to optimise the performance of 
the ICA algorithm as i t is done for the EFICA 



(jKoldovsky et al. 1 120061 ) algorithm where choices 



of contrast functions are tried iteratively. 

2.2.3. FastICA 

Finally, we can maximise the negentropy given 
in equation [9] by finding a unit vector w that 
maximises the non-Gaussianity of the projec- 
tion Hi = w'^x, so that J(w'^x) is at its maxi- 
mum. For the fixed-point FastICA algorithm, this 



can be achieved by the simple iterations scheme 
(|Hvvarinen Ill999l : iHvvarinen fc Oja Il2000[l : 



1. Choose initial (i.e. random) weight vector w 

2. Increment: w+ = Fj[xg{'w'^x.)]—E[g (w^x)]w 

3. Normalise: w = w+/||w+|| 

4. Repeat steps 2 & 3 until converged 

where g and g are the first and second deriva- 
tives of the contrast function G(-). The itera- 
tion scheme above estimates only one weight vec- 
tor at a time, for estimating all non-Gaussian 
components in parallel the iteration scheme and 
derivates of G(-) are given in appendix IC.ll and 



Hvvarinen fc Oia 



20011 

Koldovskv et al. I 120061) . For a comprehensive 
summary of other ICA a lgorithms please refer to 
(|Comon fc Jutten Il2000l ). Throughout this paper, 
we use a variant of the FastICA algorithm. 

2.2.4. Projection Pursuit and ICA 

Projection Pursuit and Independent Compo- 
nent Analysis are inherently linked as both al- 
gorithms try to represent the most non-Gaussian 
components in an multidimensional data set. In 
this sense, one can think of ICA as a variant of 
PP with two major differences: 1) PP only es- 
timates one non-Gaussian component at a time 
whilst ICA is the multivariate definition of PP and 
estimates all non- Gaussian components simultane- 
ously, 2) as opposed to ICA, PP does not need an 
underlying data model and no assumptions about 
independent components are made. If the ICA 
model holds, optimizing the ICA non-Gaussianity 
measures produce independent components; if the 
model does not hold, then w hat we get are the 



app 

in the standard literature (e.g. IHvvarinen 19£ 



Comon fc Jutten 1200 



projection pursuit directions (jHvvarinen fc Oia 



2OOOI: IStonell2004l) . This is an important point 
to make with regards to time-consecutive transit 
observations, which break the underlying ICA as- 
sumptions otherwise. 

3. The algorithm 

Following from the discussion above, we can un- 
derstand the signal de-mixing to be a two step 
process: de-correlation of the Gaussian compo- 
nents in the observed data using PCA, followed 
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Fig. 1. — Flowchart illustrating the algorithm. 
The input data is first transformed into an orthog- 
onal set using PCA. The latent signals compris- 
ing the input data are then separated using the 
MULTI-COMBI algorithm which is followed by a 
signal sorting step. The separated lightcurve and 
systematic noise components are then fitted to the 
original data. 



by the de-correlation of non-Gaussian components 
using ICA and WASOBI algorithms. The de- 
correlation of Gaussian components to form a new 
uncorrelated vectors set can be understood as pre- 
processing step to the ICA procedure. The algo- 
rithm proposed here consists of five main parts: 
1) Pre-processing of the observed data, 2) Signal 
separation, 3) Signal reconstruction 4) Lightcurve 
fitting and 4) Post-analysis. Figure [T] lays out the 
individual processing steps of the algorithm. 

3.1. Signal pre-processing 

Similarly to section [21 the observed signals can 
be expressed as fc x m dimensional matrix X where 
each row constitutes an individual time series, Xk, 
with m number of data points. Multiple time se- 
ries observations are needed to separate the instan- 
taneously mixed non-Gaussian components. The 
process of identifying statistical independent com- 
ponents is greatly simplified if the input signals to 
any ICA algorithm have previously been whitened 
(also referred to as sphering). Whitening is essen- 
tially a transformation of our input data matrix 
X into a mean subtracted, (X — X), orthogonal 
matrix X, where its auto-covariance matrix, Cx, 
equals the identity matrix, Cx = E[XX'^] — I. 
The instantaneous mixing model for the whitened 
data is now given by 

X = C^^/^(X-X) = AS (11) 

— 1/2 

where Cx is the inverse square root of Cx and 
A the corresponding mixing matrix of X. For a 
more detailed explanation see Appendix. 

This whitening is easily achieved by performing 
a principal component analysis on the data (see 
Appendix). This step has two distinct advantages: 

1) It reduces the complexity of the un- whitened 
mixing matrix. A, from parameters, to n(n — 



l)/2 for a whitened, orthogonal matrix A (jHvvarinen fc Oia. 
20011 ). 2) Using whitening by principal compo- 
nents, we can reduce the dimensionality of the 
data-set by only maintaining a sub-set of eigen- 
vectors. This reduces possible redundancies of the 
components comprising the data and prevents the 
later to be employed ICA algorithm from over- 
learning for over-complete sets. 

We also like to note that any type of addi- 
tional linear signal cleani ng or pre-processing ste 



such as those described bv lCarter fc Winn I (|200 
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Waldmann et al. I (|201l[ ). are allowed. Linear data 
filtering or cleaning can be understood as multi- 
plying equation |3] from the left with a linear trans- 
formation B to get: BX = BAS. The underlying 
data model assumed in this paper is hence not af- 
fected. 

3.2. Signal separation 

After the observed signals have successfully 
been whitened (X), we estimate the mixing ma- 
trix of the whitened si gnal, A, using the MULTI - 



COMBI algorithm (|Tichavskv et al. I l2006a[ ). 
MULTI-COMBI co mprises two complimenta ry al- 
gorithms, EFICA dKoldovskv et"al~l |2006[) and 



WASOBI (lYeredor I I2OOOI) . EFICA an asymp- 



toticall y efficient variant of the FastICA algo- 
rithm (jHwarinen I1999I) . is designed to sepa- 
rate non-Gaussian, instantaneously mixed sig- 
nals. WASOBI, on the other hand, is an asymp- 
toticaUy efficient version of the SOBI algorithm 



(jBelouchrani et al. II1997I ). and is geared towards 



separating Gaussian auto-regressive (AR) and 
time-correlated components. It uses second-order 
statistics and can be understood to be similar to 
principal component analysis. The use of both al- 
gorithms is necessary since a real life data set will 
always contain a mixture of both, non-Gaussian 
and Gaussian AR processes. For a more in-depth 
discussion of the algorithms employed here, we 
like to refer the interested reader to the appen- 
dices [C]T] & [C]2] and the original publications. 

The EFICA and WASOBI algorithm can be 
shown to be asymptotically efficient, i.e. the es- 
timators appro ach the Cramer-Rao lower bound 
(|bavison][200i). In other words, the algorithms 
employed here can be shown to converge to the 
correct solution given the original source signals 
and in the limit of ^ 00 iterations. In reality 
the number of iterations is finite and and imper- 
fect convergence results in traces of other sources 
to remain in the individual signals comprising S. 
We can hence state that equation |4] becomes 



W 



(12) 



A measure of this error is the deviation of WA 
(or WA for the whitened case) from the unity 
matrix by inspecting the variance of its element s 



(|Koldovskv et al. ll2006l : [Hvvarinen fc Oja. Il200l[ ) 

This leads to the concept of the interference-over 



signal ratio (ISR) matrix. The ISR is the standard 
measure in signal processing of how well a given 
signal has been transmitted or de-convolved from 
a mixture of signals. It can be understood as 
the inverse of the signal-to-noise ratio (SNR). The 
higher the ISR for a specific signal, the less well 
has it been separated from the original mixture. 
For a real case example, A is unknown and the 
ISR needs to be estimated. An analytic approxi- 
mation to the ISRs for the EFICA and WASOBI 
algorithms are found in the appendices I C . II fc IC . 2 1 
Finally, we check the stability of the signal sep- 
aration by perturbing the input matrix X by a 
random and known matrix P to give 



X2 = PX = PAS 



(13) 



We now re-run the MULTI-COMBI procedure 
using X2 as input and estimate PA. Knowing 
P we can work backwards to obtain A as we are 
dealing with a linear transformation. This step is 
repeated several times to check the convergence 
of the algorithm by inspecting the variation on 
the mean ISR values of each separation attempt 
and the variations in consecutive estimations of A 
directly. 

3.3. Signal reconstruction 

Once the mixing matrix, A is estimated, we 
need to identify which signals are astrophysical, 
which ones are white and which are systematic 
noise. This is done in a two step process: 

1) We construct the estimated signal matrix, 
S, and for its individual components §1 compute 
the Pearson correlation coefficient between §1 and 
the first principal component of the PC A decom- 
position in section 13.11 For medium signal to 
noise (SNR) observations, the first principal com- 
ponent (PC), ie. the one with the highest eigen- 
value associated to it, will contain the predomi- 
nant lightcurve shape. As previously discussed, 
the first PC is not perfectly separated from the 
systematic signals and hence cannot be used di- 
rectly for further analysis but it is good enough to 
use it as lightcurve identification. The identified 
lightcurve signal is labeled Sa. 

2) Once the lightcurve signal is identified, we 
exclude this row from S and proceed to classify 
the remaining signals with respect to their non- 
Gaussianity (ie. systematic noise sources). Here 
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we use the Ljung-Box portmanteau t est (see Ap- 
pendix and iBrockwell fc Davis" 20061 ) to test for 
the hypothesis that the time series is statisticahy 
white (ie. Gaussian). This test was originally 
designed to check the residuals of auto-regressive 
moving-average (ARMA) models for significant 
departures from Gaussianity. It is hence ideally 
suited for our need to identify which estimated 
signal components are the desired non-Gaussian 
ones. 

The identified non-Gaussian, systematic noise, 
signals are hence labeled Sgn and the remaining 
white noise signals S^n to give 



A 



+ Swn = S = WX 



(14) 



where S has dimensions k x / and Sa, Ssn, Swn, 
have dimensions of fc x 1, kxlgn and k x Zu,„ respec- 
tively where I = '^hn,-^'^ ^wn + 1- The de- mixing 
matrix is given by W = A^-"-. 

As previously mentioned, the components of S 
have ambiguities in scaling and sign and can be 
thought to be similar to the eigenvectors of a prin- 
cipal component analysis with missing eigenvalues. 
Fortunately there exist two approaches to resolv- 
ing this degeneracy: 

1. In the case of Sa being well separated as in- 
dividual component, we can take Sa and the 
de-mixing matrix W and only retain the row 
containing the astrophysical signal compo- 
nent forming the row- vector Wa- We then 
reconstruct the original data X using only 
the separated signal component: 



Xa = 



^Sa = 



'WX 



(15) 



where Xa is the reconstructed whitened data 
with all but the astrophysical signal compo- 
nents removed. Using equation 111! we can 
now calculate the un- whitened matrix Xa- 



Xa = Z(X- X) +X 



Z = W7^W 



(16) 
(17) 



Hence we can think of Z as a linear, optimal 
filter for the signal component in X. Please 
note that this linear filtering does not impair 
the scaling information as this is re-instated 
going from Sa to Xa- 



2. In the case of Sa not being well separated 
but other systematic noise components are, 
a different, more indirect approach can be 
used. Here, the systematic noise compo- 
nents, Ssn which do not contain sign or 
scaling information, are simultaneously fit- 
ted to the time series data (preferably out- 
of-transit data), Xk- We therefore define 
the systematic noise model for an individual 
time series by Mgn, 



Msn = OSs 



(18) 



where O is a k x k diagonal scaling matrix 
of Ssn, which needs to be fitted iteratively 
as free parameters in the following section. 

3.4. Lightcurve fitting 

Having either filtered the data to obtain Xa or 
constructed the noise model Msn, we can now fit 
the original time series, Xk using the st andard an- 
alytical lightcurve models (Ma ndel fc Agol 200^ 
Seager fc Mallen-Ornelas Il2003[ ) in addition to the 
diagonal matrix O, if necessary. For the pur- 
pose of this paper, which focuses on blind-source- 
separation, we will restrict ourselves to demon- 
strating the feasibility of estimating O and only 
leave the transit depth as variable lightcurve pa- 
rameter. We use the an alytic lightcurve model by 
Mandel fc Agol I (120021) and a Ne lder-Mead min- 
imisation algorithm ([Press et al. 1 12007 ). For real 
data applications, we advise the reader to use 
Markov Chain Monte Carlo methods, or simi- 
lar, which have become standard in the field of 
exoplanets and allow the estimation of the pos- 
terior probability distribution s and their associ- 
ated errors (Bakos ct al. 2007^ Burke et al. 12007 : 
Cameron et ani2007t iFo7d1l2006t iGregorv Il201ll) . 



3.5. Post-analysis 

Once the model fitting stage has been com- 
pleted, we are left with fitting residual, r^, i.e. 
rk = Xk — iTLk- Several tests are useful to de- 
termine how well the signals have been removed 
from the original time series, Xk- In the case of a 
full Markov Chain Monte Carlo fitting, the poste- 
rior distributions of the fitting parameters may be 
used to asses the impact of the remaining system- 
atic noise in the data when compared to a simu- 
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lated data set only containing white noise. Port- 
manteau tests on individual time series are use- 
ful to test for non-white noise signals and allow 
a measu re of the overall perform ance of the al- 



gorithm (|Brockwell fc Davis II2006I ). Additionally, 



we can determine the Kull back-Leibler divergence 
(jKullback fc Leibler 1119511 ) of our residual's prob- 
ability distribution function (pdf) to an idealised 
Gaussian case. 

For the simulations and real-data examples pre- 
sented in the following section, we have merely 
plotted the autocorrleation functions (ACF) of 
the residuals obtained to determine whether for 
a given lag, these are within the Scr confidence 
limit of the residual being dom i nated by white- 
noise (jBrockwell fc Davis l liooi: iDavison I [2OO9I) . 



Here the ACF is given by: 



^ rn—T 

ACF{k, t) = — (rfe.t - fk){rk,t+T - rt) 



(19) 



r = 0,1,2,3, ...rn/2 



where m is the number of data points in the time 
series, t the specific lag and the confidence inter- 
vals are given by ±cr/Y^. 

4. Simulations 

In order to test the behaviour and efhciency 
of the algorithm described above, we produced a 
toy model simulation wi th five observed s ignals : 



1) a secondary eclipse iMandel fc Agol I (|2002l ) 



lightcurve; 2) a sinusoidal signal; 3) a sawtooth 
function; 4) a fourth order auto-regressive signal 
to simulate time-correlated signals; 5) Gaussian 
noise with a full width half maximum (FWHM) 
of 0.01 magnitudes. The premixed signals are dis- 
played in figure [21 This gives us our signal matrix, 
S, which needs to be recovered later on. We have 
then proceeded to mix the signals in figure [2] us- 
ing a random mixing matrix. A, to obtain our 
'observed signals', X, in figure [3 For the sake of 
comparability we keep the mixing matrix A to be 
the same for all simulations. 

We now subdivide the simulations to illustrate 
the two possible methods of the signal reconstruc- 
tion. Method 1 computes Xa using equation [16] 
whilst Method 2 fits the noise model Mgn (equa- 



tion's]) simultaneously with the iMandel fc Agol 
(j2002n lightcurve. These two examples demon- 
strate that both techniques work equally well for 
a well behaved data set. 

4.1. Method 1: Filtering out the signal 

In this example, we use the 'observed' signals 
in figure [3] as input to the algorithm. We how- 
ever do not perform a dimensionality reduction 
using PC A since we are not dealing with an over- 
complete set in this example. The results of the 
separation are shown in figure [S] Here the top 
three, red lightcurves are the estimated system- 
atic noise components as identified by the algo- 
rithm. The fourth component is Gaussian noise 
and the bottom is an inverse of the lightcurve sig- 
nal. It should again be noted here that the blind- 
source separation does not preserve the scaling nor 
the signs of the signals in S. However, when the 
original data is reconstructed using only the sig- 
nal component, Sa, to obtain Xa (equation [16]) , 
the scaling and sign informations are re-instated. 
For a well behaved data set, i.e. one that obeys 
the instantaneous mixing model and has negligi- 
ble Gaussian noise in their signal components, it 
is therefore possible to re-construct the lightcurve 
signal from the raw data as explained in section 
13.31 Figure [4] shows the top lightcurve of figure 
[3] (blue circles) and overplotted the retrieved sig- 
nal component (red crosses) and offset below the 
systematic noise component (black squares). 

As a useful by-product of the algorithm, we 
obtain the interference over signal matrices (ISR, 
equations IC7I & ICllI in the Appendix [C| for both 
the EFICA and WASOBI algorithms. These give 
us valuable information on the efficiency at which 
the signals have been separated. Figure [6] shows 
the Hinton diagrams of the EFICA and WASOBI 
ISR matrices. Here, the smaller the off-diagonal 
elements of the matrix, the better the signal sep- 
aration. In this example, the EFICA algorithm 
outperforms the WASOBI one, which is to be ex- 
pected since all signals but one are non-Gaussian. 

4.2. Method 2: Fitting a noise model to 
the data 

In the previous section, we have shown that in 
the case that the astrophysical component Sa is 
well separated as individual signal, we can create 
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Fig. 2. — Simulated input signals before mix- 
ing. From top t o bot tom: 1) secondary eclipse 
Mandel fc AgoTI (|2002l) curve, 2) sinusoidal func- 
tion, 3) sawtooth function, 4) time-correlated 
auto-regressive function, 5) Gaussian noise. The 
scaling of the ordinate is identical for all subplots. 
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Fig. 3. — The signals, S, in figure [2] were mixed us- 
ing a random mixing matrix A to obtain the 'ob- 
served signals', X normalised to unity, shown in 
this diagram. The algorithm takes the lightcurves 
in this diagram as starting values. No further in- 
put is provided or assumptions on the underlying 
signals made. The scaling of the ordinate is iden- 
tical for all subplots. 



a filter for the raw data that directly filters the 
lightcurve signal from the noise. However, in most 
real data applications, Sa, is not perfectly sepa- 
rated but the components of Sgn may be more so. 
In this case we can construct the noise model Mgn 
given by equation [18] and the diagonal elements of 
O are fitted as described in section 15^ The start- 
ing position of the algorithm is the same as for 
the previous example (figure The model fit of 
the first lightcurve in figure |3] and its residuals are 
shown in figure [T) The autocorrelation function 
for 100 lags is plotted in figure IS] All but two lags 
are within the Str confidence limit that the resid- 
ual is white noise dominated, indicating that all 
signals have been removed effectively. 

Finally we simulate the convergence properties 
of both EFICA and WASOBI under varying white 
noise conditions. Here we repeatedly run the al- 
gorithm until signal separation is completed and 
record the mean ISRs of the source separation. We 
performed this simulation 300 times for Gaussian 
noise FWHMs varying from 0.0 - 0.3 magnitudes 
(figureElhas a FWRMgauss = 0.01) and every ISR 
measurement reported is the mean of 10 iterations. 
Figure [HI summarises the results. Here, the red 
circles represent the mean ISR or the EFICA al- 
gorithm and the blue crosses that of WASOBI. 
It can clearly be seen that for this example the 
EFICA algorithm outperforms WASOBI and on 
average reaches lower ISR values. We can further 
note that the blind source separation is not sig- 
nificantly affected by the magnitude of the white 
noise and performs well under difficult signal to 
noise conditions. 

5. Application to data 

The previous examples illustrated the two 
methods available to correct the observed data: 
Method 1: Filtering the astrophysical signal out of 
the systematic noise or Method 2: fitting a system- 
atic noise model to the raw data. Here we apply 
these techniques to two primary eclipse data sets 
of HD 189733b and XOlb recorded by the NIC- 
MOS instrument on the Hubble Space Telescope 
as well as a single time-correlated time series ob- 
tained by the Kepler spacecraft . 
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Fig. 4. — Results of the blind-source separation. 
The blue circles present the the first lightcurve 
of the raw data X, the red crosses the retrieved 
signal component, Xa, and the black squares the 
systematic noise component Xgn- 
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Fig. 5. — Results of the blind-source separation. 
The top three signals in red were identified by 
the algorithm to comprise the systematic noise 
model, Ssn- The 4th signal was correctly iden- 
tified to be Gaussian noise and the bottom to be 
the lightcurve signal. Note that the blind-source- 
separation does not preserve signs nor scaling of 
the estimated signals. The scaling of the ordinate 
is identical for all subplots. 
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Fig. 6. — Hinton diagram of the EFICA and WA- 
SOBI interference-over-signal matrices for Exam- 
ple 1. The polygon areas are normalised to the 
highest value in the matrix (given in the bottom 
corners). The smaller the off-diagonal elements of 
the matrix, the higher the signal separation effi- 
ciency of the algorithm. In this case we can see 
the EFICA algorithm to perform better than the 
WASOBI one. 
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Fig. 7. — showing the raw lightcurve (first row in 
figure[3J blue) normalised to unity, with the model 
fit (red) overlaid and the fitting residuals plotted 
underneath (black). 
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Fig. 8. — showing the auto-correlation function 
for 250 lags (red). The 3a confidence limits that 
the observed residual is normally distributed are 
shown in blue. All but two lags are within the con- 
fidence limits, strongly suggesting that the resid- 
ual is dominated by white noise and correlations 
were efficiently removed. 
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Fig. 9. — showing the mean interference over sig- 
nal ratios (ISRs) for both the EFICA (red circles) 
and WASOBI (blue crosses) algorithms for Ex- 
ample 1. In this example, the EFICA algorithm 
clearly outperforms WASOBI by reaching lower 
ISR values. Both algorithms are stable even un- 
der low signal to noise conditions. 



5.1. HST/NICMOS: HD 189733b 

First presented by ISwain et al. this 



data set of the primary eclipse of HD 189733b was 
recorded using HST/NICMOS in the G206 grism 
setting spanning five consecutive orbits. The 
HST-pipeline calibrated data were downloaded 
from the MASlfl] archive and the spectrum was 
extracted using both standard IRAIo routines as 
well as a custom built routine for optimal spec- 
tral extraction. Both extractions are in good ac- 
cord with each other but the custom built routine 
was found to yield a better signal to noise and 
was subsequently used for all further analysis. A 
binning of 10 spectral channels (~ 0.08/im) was 
used resulting in 10 light curves across the G206 
grism band. Figure [10] shows the obtained time se- 
ries which serve as input to the algorithm. It can 
be seen that each time series is strongly affected 
by instrument systematics propagating from the 
blue side of the spectrum (bottom light curve) 
to the red w i th var ying intensity and even sign. 
Swain et al. I (j2008l ) showed that these systemat- 



ics are correlated to instrument state vectors such 
as orbital phase, relative positions and angles of 
the spectrum on the detector, instrument tem- 
perature, etc. We can hence expect that these 
systematics are statistically independent from the 
recorded astrophysical signal (the light curve) and 
it should therefore be in principle possible to de- 
correlate the signal. 

We here demonstrate the de-trending on an in- 
dividual light curve at ^1.694fim (8*^ one down in 
figure [T3)) . All time series in figure [13] were taken 
as input to the algorithm described above to esti- 
mate the de-mixing matrix W, the astrophysical 
signal vectors, Sa and the systematic noise vec- 
tors, Ssn- The interference over signal (ISR) ma- 
trix indicated the good separation of four main 
components figure [TT] with the rest of the com- 
ponents being classified as predominantly Gaus- 
sian or weakly systematic. The existence of more 
than one Gaussian component {Iwn > 1) indicates 
that the set is overcomplete. However since the 
data-set is small enough, no PCA dimensionality 
reduction was performed. After the algorithm has 
identified the correct astrophysical signal, it pro- 
ceeded to reconstruct the light curve using both 



^http:/ /archive. stsci.edu/ 
^http:/ /iraf.noao.edu/ 
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methods described above. 

Method 1: The astrophysical signal was filtered 
using equations [15] & [TH Figure [12] shows the 
raw light curve (blue circles) with the de-trended 
time series, Xa underneath (green squares). Su- 
perimposed ligh t curv es were computed using 



Mandel fc Agol 



were taken from 



20021) with orbital parameters 



Winn et al.1 (l2007l) and limb- 
darkening parameters from Claret ( 2000[ ). It is 
clear that the de-trended light curve is an improve- 
ment to the raw time series but that systematics 
still remain in the data. This is further illustrated 
by plotting the autocorrelation function of the 
model- fit residual in figure [16] (red circles). Here, 
residual correlation can be observed in particular 
at low lags. This is a consequence of the astro- 
physical signal, Sa, being well separated but as 
shown in figure ITT] (component 1), there remains 
some weak interference between the Sa and other 
vectors, which is a consequence of equation [T^] and 
to be expected for real data-sets. 

Method 2: The second method is a less direct 
approach. Instead of filtering for the astrophysical 
signal directly, we try to construct a 'systematic 
noise model' that is then subtracted off the raw 
data. Us ing equation[T8]and a si mplex downhill al- 



gorithm (jNedler fc Mead II1965I ) we estimated the 



scaling matrix, O, by fitting the the systematic 
noise vectors, Ssn to the four out of transit orbits. 
The scaled systematic noise vectors are shown in 
figure [13] which combine to form the systematic 
noise model, Mgn, in figure [131 It should be noted 
that O is only a scaling matrix of the individual 
vectors as the scaling information is not preserved 
by the independent component analysis. Hence, 
relative intra and inter-orbit variations are pre- 
served. Figure [15] shows the corrected data by 
subtracting the systematic noise model off the raw 
data. Inspecting the fitting residual's autocorrela- 
tion function in figure [T^] (black circles) indicates 
the residual to be statistically white and a maxi- 
mal de-correlation of the data has been achieved. 

5.2. HST/NICMOS: XOl-b 



Originally presented bv iTinetti et al. ( 2010f ). 



the primary eclipse of XOlb was observed using 
the HST/NICMOS instrument in the G141 grism 
setting. The HST-pipeline calibrated data was 
downloaded and the spectra extracted using the 
same settings as for section 15.11 This yielded 10 
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Fig. 10. — showing 'raw', extracted 

HST/NICMOS light-curves of HD 189733b 
primary eclipse. Light curves are offset for clarity. 
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Fig. 11. — the Interference over Signal (ISR) ma- 
trix of the component separation for both the 
EFICA and the WASOBI algorithms. All val- 
ues were normalised with the maximum ISR = 
0.0626. Components 1, 3, 5 fc 8 yielding the low- 
est ISR values and correspond to the astrophys- 
ical light curve signal (comp. 1) and the three 
most prominent systematic noise vectors in fig- 
ure [14] Other components were identified as pre- 
dominantly Gaussian or weakly systematic by the 
pipeline. 
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Fig. 12. — showing the raw-data hght curve 
(blue crosses) and the corrected hght curve (green 
squares) offset below. In this example, we used 
equations [15] & [16] as light curve filter. The sys- 
tematic noise components were reduced but resid- 
ual systematics remain in the final light curve. 
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Fig. 14. — Individual systematic noise vectors, 
Ssn, of HD 189733b, with the appropriate scaling. 
Combined they form the systematic noise model 
in figure [T3] (red circles) . 




Fig. 13. — showing the same 'raw' light curve as 
in Figure [12] (blue crosses) and the calculated sys- 
tematic noise model (red circles) offset below. 



Fig. 15. — showing the de-trended data by sub- 
tracting the noise model of the raw data. 
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light curves and which serve as input to the algo- 
rithm, see figure [171 Similar to the HD 189733b 
the algorithm retrieved four main components, the 
light curve signal and three main systematic noise 
components. The ISR matrix can is shown in fig- 
ure[T8l We now proceeded to de-trending the light 
curve at the very red end of the spectrum (first 
from top in figure [TT]) as it, after visual inspec- 
tion, exhibits the most prominent systematics of 
the 10 time series. Light curve fits a ssumed limb 
darken ing and orbital parameters bv lBurke et al. 



Method 1: Figure [T9l shows the raw time series 
and the de-trended light curve using equation 1161 
The light curve is significantly de-trended but sys- 
tematics remain in the data as also shown by the 
autocorrelation function (red circles) in figure 1231 

Method 2: As described in previous sections, 
figure ED shows the retrieved systematic noise vec- 
tors and figure [21] features the 'raw' data with 
the combined systematic noise model (red) under- 
neath. The autocorrelation function of the model 
fit residual is shown in figure [23| (black crosses) 
and shows a factor 2 improvement on the de- 
correlation in the lower lags. 

Figure [20l compares the de-trended light curves 
of method 1 and method 2 and shows the residual 
of method 1 - method 2 (black crosses). There is 
little difference between both methods indicating 
that the signal separation for this data-set is close 
to its maximum with the data being partially de- 
correlated. This is in contrast to the HD 189733b 
example where a near perfect de-correlation was 
achieved and can be attributed to the systemat- 
ics being mostly wavelength invariant in the case 
of XOlb. In other words, systematic noise compo- 
nents which have a constant weighting throughout 
the data set cannot be de-correlated using ICA or 
PCA methods, which is to be expected following 
equations [D - [3l 

5.3. Kepler: Star 10118816 

In the previous examples we have shown that 
spectroscopic datasets can be de-correlated effec- 
tively. We test here how well the proposed al- 
gorithm can de-correlate consecutively observed 
data-sets. This is of particular interest in cases 
where no multi-channel data are available and 
the time series data are contaminated with time- 
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Fig. 16. — showing the autocorrelation function 
for 100 lags of the fitting residual in figure [12| 
(red squares) and figure [TSj (black circles). The 
blue lines signify Scr limits for a Gaussian distri- 
bution. The fitting residual of figure [12] shows 
high amounts of residual correlation, particularly 
at lower lags whilst the fitting residual of figure [15] 
follows a Gaussian distribution. 
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Fig. 17. — showing 'raw', extracted 

HST/NICMOS light-curves of HD 189733b 
primary eclipse. Light curves are offset for clarity. 
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Fig. 18. — same than for figure [TT] The light curve 
vector (component 1) shows residual interference 
with other vectors for both EFICA and WASOBI 
algorithms. Overall the EFICA algorithm outper- 
forms WASOBI. 
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Fig. 20. — showing the de-trended data using 
method 1 (top blue circles) and method 2 (bottom 
green squares) offset from each other. Both results 
show little differences between them as seen by the 
residual of method 1 - method 2 (black crosses). 
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Fig. 19. — showing the raw-data light curve 
(blue crosses) and the corrected light curve (green 
squares) offset below. In this example, we used 
equations [12] & [M] as light curve filter. The sys- 
tematic noise components were reduced but resid- 
ual systematics remain in the final light curve. 
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Fig. 21. — Individual systematic noise vectors, 
Ssn, of XOlb, with the appropriate scaling. Com- 
bined they form the systematic noise model in fig- 
ure [T3] (red circles). 
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Fig. 22. — showing the same 'raw' hght curve as in 
FigurefTOlfblue crosses) and the calculated system- 
atic noise model using the systematic noise vectors 
in figure EH 
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Fig. 23. — showing the autocorrelation function 
for 100 lags of the fitting residual for method 1 (red 
squares) and method 2 (black circles). The blue 
lines signify 3a limits for a Gaussian distribution. 
The fitting residual of method 1 shows residual 
correlation, particularly at lower lags whilst the 
fitting residual of method 2 is by a factor of two 
better de-correlated in the lower lags. 



correlated noise, be it stellar or instrumental. As 
opposed to the previous examples, where several 
time series, Xk, were observed simultaneously, here 
we take a single time series covering several con- 
secutive eclipse features and cut the time series 
into segments spanning equal lengths over each 
eclipse event. Using these segments as inputs to 
the algorithm clearly violates the underlying as- 
sumptions of the independent component analy- 
sis, as the mixing is not instantaneous. In this 
case, the ICA analysis can be understood as a 
Pro j ection Pursuit (PP) analysis, see sectionl2.2.4 



and dHvvarinen fc Oia II2OO0I : IStone ibooilHuber 



Here the ICA algorithm, in the absence of 
a working ICA data-model, will try to extract as 
many non-Gaussian components as possible and 
return the rest of the data in its original form. 
This is very similar to Projection Pursuit, where 
the data is not described by an underlying data 
model at all but only the most non-Gaussian com- 
ponent is retrieved. In other words, we can only 
expect to retrieve the eclipse signal component, 
Sa, with any degree of accuracy. As a result we 
will not be able to retrieve systematic noise com- 
ponents, Ssn, and we can only use Method 1 (in 
section [331) to de-trend the data. 

We have downloaded data observed b y the 
Kepl e r space telescope ( Borucki et al. I Il996 , 



2010; 'Jenkins et ahj 



Caldwell et al. 



201c ; 



Koch ct al. 20icffor a planet-hosting candidate 
star observed over the second and third data- 
release quarters (Q2 & Q3). The time series, with 
the Kepler ID: 10118816, exhibits highly variable 
features and significant time-correlated noise (see 
figure [24l blue crosses). Given Kepler's superb 
instrument calibration, we can assume this time- 
correlated noise to be due to stellar variability. 
Using the periodogram calculator on the NSteD 
database!!, we identified four main periodically 
recurring signals in the data-set. Choosing the 
second strongest feature with a period of 0.040915 
days, we phase folded the data and cut the time 
series in 10 equally sized segments. As for the 
previous examples we now took these time series 
segments as input to the algorithm. 

We performed our de-correlation as for the pre- 
vious examples but using Method 1 only. Figure!^ 
shows the ISR matrix of the separation indicating 



' ]http : I /nsted.ipac.caltech.edu/applications/ETSS/KeplerSndex.html\ 
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a relatively poor separation of the components but 
two (components 4 & 9). As discussed above, this 
behaviour is to be expected with the breaking of 
the instantaneous mixing modeL Nonetheless, we 
obtained a clear feature (component 4) in our anal- 
ysis which is over plotted (red circles) on the mean, 
phase- folded data (blue crosses) in figure [26l Here 
the de-correlated signal has a much reduced scat- 
ter compared to the mean of the phase-folded fea- 
ture, which indicates that much of the unwanted 
stellar variability has been removed. It is also 
clear from this figure that we are not dealing with 
an exoplanetary light curve but a stellar pulsa- 
tion feature. As expected, the remaining compo- 
nents returned from the algorithm (figured?]) are 
the residuals of the input data minus the compo- 
nent shown in figure 1261 Hence we only used the 
component in figure [26l to re-construct the origi- 
nal time series. This was done by using equation 
[TBI on each segment of the time series, followed by 
adding the segments back together in the order 
they were originally split up. 

Figure [24] shows the original input data (blue 
crosses) with the filtered signal (red circles) over 
plotted on top. In the bottom plot (a zoomed 
in version of the time series), it is clear that the 
desired feature remains in the filtered time series 
whilst the contribution of other time-correlated 
stellar noise is substantially reduced. 

6. Discussion 

In the previous sections we have shown that 
for a set of simultaneously observed time series 
data (e.g. following an exoplanetary eclipse with 
a spectrograph) we can describe the data by an 
instantaneous mixing model (equation [3]). This 
allows the separation of non-Gaussian, time and 
spatially-correlated signals from one another. The 
degeneracy caused by not being able to retrieve 
the component's signs or amplitudes can be cir- 
cumvented in two ways: Method 1) The separated 
signals are used to construct a linear transforma- 
tion to filter the astrophysical signal from the orig- 
inally observed data and hence preserve all scaling 
information; Method 2) The separated astrophys- 
ical signal is not used directly but instead all sys- 
tematic noise components are combined to form a 
'systematic noise model' which can then be used 
to correct the original observed data. 
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Fig. 24. — Input time series (blue crosses) with 
filtered signal using Method 1 over plotted (red 
circles). Bottom plot is a zoomed in part of the 
time series above. The algorithm effectively filtered 
for the desired feature and strongly decreased con- 
tributions from autocorrelated noise. 
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Fig. 25. — the Interference over Signal (ISR) ma- 
trix of the component separation for both the 
EFICA and the WASOBI algorithms. All val- 
ues were normalised with the maximum ISR — 
0.09293. Components 4 and 9 are the best sepa- 
rated, with component 4 being the desired signal 
component. 
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Fig. 26. — showing the mean, phase-folded fea- 
ture (blue crosses) with the ICA filtered signal 
component (red circles) over plotted. The ICA fil- 
tered signal shows a significant reduction in scatter 
and auto-correlative noise compared to the simply 
phase folded data. 
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Fig. 27. — Addition components to the signal in 
figure [26] as calculated by the algorithm. 



We have explored the efficiency of the signal de- 
trending on two simulated and two HST/NICMOS 
data sets with different types of systematic noise 
due to different grisms. The simulations demon- 
strate the two methods of de-trending the data in 
an idealised case and explore the efficiency of the 
signal separation in the presence of varying Gaus- 
sian noise in the data. In the instantaneous mixing 
model employed here, Gaussian noise sources are 
only indirectly allowed and can interfere with the 
effectiveness of separating non-Gaussian vectors. 
We tested this point by adding additional Gaus- 
sian noise components of variable amplitude to the 
simulations but did not observe any significant re- 
ductions in the signal separation efficiency. 

We proceeded to analyse two HST/NICMOS 
data sets: the primary eclipses of IID189733b and 
XOlb. For both data sets we find the Method 2 to 
yield better results. In the case of HD189733b, 
we can achieve a near perfect de-correlation of 
astrophysical signal and systematic noise and no 
further steps are necessary to the de-correlation 
process. A more in depth discussion of this data 
set and HST/NICMOS systematics is beyond the 
scope of this publication. In the case of XOlb the 
de-correlation is significant but incomplete. The 
difference in maximum de-correlation achievable 
can be attributed to the systematic noise sources 
being strong functions of wavelength in the case of 
HD189733b whilst almost with constant weighting 
{uki in equation [5]) in the case of XOlb. 

Whenever systematics have constant weighting 
per channel observed (xk) and/or time, it becomes 
very difficult for PCA or ICA based approaches to 
de-correlate the signal from the systematics. Here 
auxiliary information of the instrument is needed 
to proceed with the de-correlation process. This is 
very well possible in the case of dedicated instru- 
ments such as Kepler, as these have been specif- 
ically designed with such hi gh precision and sta- 
bility measurem ents in mind teorucki et al. |[l996t 
Ijenkins et al. 1120^10.) . For instruments that do not 
feature the calibration plan required to further de- 
correlate with instrument state parameters, the 
solution is far less obvious. 

We furthermore explored the de-correlation of 
eclipse signals observed consecutively rather than 
in parallel. We demonstrated, using Kepler data, 
that despite the formal violation of the 'instan- 
taneous mixing model', the proposed algorithm 
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is able to retrieve the desired signal component 
with good accuracy. Such an application is par- 
ticularly important for treating variability of the 
host-star which can significantly im pair the qual 
ity o f the final science re s ult fee. ICzesla et al. 



Ballerini et al. II2011I ) 



2009: Boisse et al. 2011: Aigrain et al. 



2011 



It is furthermore interesting to note that 
pre and post-processi ng steps (e.g. wavelets 



Carter fc Winn 1 120091) . Fourier based techniques 



(jWaldmann et al. 1 1201 II) . d e-correlation u sing in- 
strument state parameters ( Swain et al. IIT0O 8)). 



do not break the instantaneous mixing model and 
can be run in conjunction with ICA methods. This 
makes independent component analysis a very 
powerful and versatile tool for non-parametric de- 
correlation of exoplanetary data sets. 

7. Conclusion 

In the light of searching and characterising 
ever smaller and fainter exoplanetary targets, 
the development of novel de-trending routines be- 
comes increasingly critical. Based on the concepts 
of blind source deconvolution of instantaneously 
mixed signals, we have presented a first step to- 
wards non-parametric corrections and data filters 
that do not require additional information on the 
systematic noise of the instrument or stellar ac- 
tivity. Such algorithms have two important appli- 
cations: 

1) For instruments that lack a calibration plan 
at the accuracy of 10^* in flux variation, which 
is required for spectroscopy of exoplanetary atmo- 
spheres, the spectroscopic signatures become in- 
herently entangled and dependent on the method 
used to correct instrument and other systematics 
in the data. The de-correlation of spectroscopic 
data was demonstrated using two HST/NICMOS 
data sets. 

2) Detections of faint exoplanetary eclipses are 
often made difficult by time-correlated activity of 
the host star. We demonstrated, using a single 
Kepler time series, that much of the stellar vari- 
ability can be removed in time series that span 
several exoplanetary eclipse events. 

The algorithm proposed is a powerful tool for 
lightcurve de-trending, which can be used by its 
own or in conjunction with any other type of 
data filtering or cleaning technique. This becomes 



an invaluable advantage for data analysis when 
the instrument's response function is unknown or 
poorly characterised. 
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supported by an STFC Studentship. 
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This appendix provides some additional notes to the methods employed in this paper. For a more in-depth 
discussion of the topics presented here, please refer to the cited publications. 



A. Uncorrelatedness, orthogonality and independence of Gaussian and non-Gaussian signals 

In Gaussian statistics, our probability densities are fully defined by the first and second statistical mo- 
ments, i.e. their means and covariances. Two random vectors, si and si+i, are said to be uncorrelated when 
their covariance (Csj^si, i) is zero: 



Cs„s,^, = E[{si ~ E[si]){si+i ~ E[si+i])] 
= E[si,si+i]-E[si]E[si+i]=0 

where E[si] is the expectation value of si which can be approximated by the mean in this case by 



(Al) 



1 



M 



t=i 



(A2) 



with M being the number of data points in the time series. 

Furthermore, we define two random variables (s/ and s;+i) to be orthogonal, when both their expectation 
values, in addition to their covariance are zero: 







(A3) 



We can always find an affine, linear transformation from a correlated set of variables to an orthogonal 
one. 

Finally, our two random vectors s; and s;+i are independent from one another if and only if the joined 
probability distribution P(s;,s;+i) of both signals are factorizable into the product of their marginal pdfs, 
Pisi) and Pisi+i): 



P(sz,s/+i) = P(s/)P(sz+i) 



(A4) 



and satisfy the property 



E[g{s,)h{s,+i)] = E[g{si)]E[h{si, 



(A5) 



where g{si) and /i(s/+i) are absolutely integrable functions of s; and s/+i respectively. From the defini- 
tion of independence in equation IA5[ we obtain the definition of uncorrelatedness (equation IA3|) in the 
special case where both s; and Sir_|_i are linear and are only defined by their covariances (i.e. no higher 
order statistical moments) ( Hyvarinen &: Oia 20001 : Hvvarinen fc Oja. T I2OOII : iRilev et al. Il2002h . In other 
words, uncorrelatedness is a special case of independence. Uncorrelated Gaussian random variables are al- 
ways also independent and the definitions of uncorrelatedness, orthogonality (for zero mean) and statistical 
independence become identical. 



B. Preprocessing 

The covariance matrix of X, Cx, is given by Cx — EDE"^, where E is the matrix of eigenvectors and D 
the diagonal matrix of eigenvalues, D = diag{di,d2, ...^dn)- Using principal component analysis (PCA), we 

— 1/2 

compute E and D and the whitening matrix is hence the inverse square root covariance matrix Cx is 



then given by equation |B2] (jHwarinen fc Oia. IbOOll Ijolliflte II2002I ) 



24 



X = C^^/2(X-X) = AS (Bl) 

C^^/^ = ED-^/^E'^ (B2) 
where W = A^"*" and is the de-mixing matrix of the whitened observed signals X. 

C. Blind source separation 

At the heart of the algorithm lies the blind-source separation routine. To attain the demixing matrix 
W, many different types and varieties o f algorithms are being us ed in the literature. Here we will use the 



'Multi-COMBI' algorithm developed by iTichavskv et al. ( 2006a ) combining a fixed point high-order ICA 



algorithm to separate non-Gaussian sources with a second-order statistics blind-source-separation (BSS) 
algorithm for separating auto-regressive (AR) sources. We will briefly outline these algorithms and explain 
how it is applied to the whitened data X obtained in section [3TTJ 



C.l. Parallel FastICA and EPIC A 

In section [52] we briefly outlined the measures of non-Gaussianity, negentropy (equation [5]), used through- 
out this paper and stated that negentropy can be approximated via the kurtosis of the random vector y or 
via the use of contrast functions, equation [9] fc [TOl In section [2.2.31 we showed stated the iteration scheme 
to obtain a single independent component (IC) at a time. This is called a deflationary algorithm where the 
computed IC is subtracted from the data before the second IC is computed. Such an iteration scheme has 
the property of flnding the ICs in the order of decreasing non-Gaussianity. However, the main drawback 
of a serial computation of ICs is that estimation errors in the first ICs propagate in the extraction of later 
ICs via the orthogonalization step. This effect is cumulative and may signficantly impair weaker ICs. This 
predicament can be circumvented by estimating all ICs in parallel. 

Similar to the single unit iteration, the whitened demixing matrix W is at its most mutually independent 
when the projection Y = W'^X is at its most non-Gaussian. The FastICA fixed point iteration step is then 
given by 



W+ ^ g(WX)X'^ - diag[g'(WX)lN]W (CI) 

where W"^ is the unnormalised next iteration of W, In is an N x 1 vector of I's and g{.) and g'{.) are the 
first and second order derivatives of the nonlinear function G(.): 



gi{y) = tanh(ai?/) 
g2{y) = yexp{-y'^/2) 

g3{y) = y^ 

This is followed by a symmetric orthogonalisation step: 

w ^ (w+w+'^)-^/2w+ 



(C2) 



(C3) 



Equations ICll & IC3I are iterated until the result has converged. 

For a full derivation we refer you to [Hvvarinen I (|l999f ) and iHvvarinen fc Oia. I (|200l[ ). Whereas the 
convergence of the FastI CA algorithm is often dependent on the non-linearity chosen by the user, the EFICA 
( Koldovskv et al. 2006[) algorithm employed here is a variant of the above iteration scheme and allows for 
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different non-linearities to be assigned adaptively to different sources. iKoldovskv et al. I (|2006l ) showed that 
EFICA is asymptotically efficient, ie. reaches the Cramer-Rao Lower Bound (CRLB) in an ideal case where 
the nonlinearity G(.) equals the score function. 

To assert a good degree of separation, we can define G as the gain matrix. For a perfectly estimated 
de- mixing matrix, W, the gain matrix is equal to its identity matrix 



G = WA = I 



(C4) 



In signal processing, the performance of blind-source separation algorithms is usually measured by the 
interference over signal ratio matrix, ISR 



q2 

ISRm = k,l^ 1,2,..., d 



G 



(C5) 



kk 



where k and / denote the observed and estimated sources. The ISR for and individual observed signal k is 
given by 



isrfe 



Ed f~,2 
l=ld=ik ^kl 



^Ik 



k = 1,2, 



(C6) 



However, the original mixing matrix. A, is not generally known for real data sets and equations IC5I fc [C6l 
are only useful in the case of simulations. iTichavskv et al. I (|2Q06al ) have shown that the whole ISR matrix 
for the EFICA algorithm can be approximated by 



ISRf,^ 



Ikili+rf) 



(C7) 



Ik = h - pi (C8) 
^fc = E{sugk{su)\ 
n = \pk ~ Pk\ 
Pk = E[g'k{sk] 

where and s; are the k'th and I'th observed and estimated signals of S in equation[31 gk{-) and (?^(.) the 
first and second derivative of G{.) for signal k and N is the number of signals estimated. Here it should 
be mentioned that, of course, the true realisation of each ISR component is unknown and a mean-ISR is 
computed leading to the best 'on average' separation of the signals. 

C.2. WASOBI 

Whilst EFICA is optimised for the separation of instantaneously mixed, non-Gaussian sources, second- 
order statistics BSS algorithms rely on time-structure in the sources' correlation function to estimate W. 
A variety of algorithms e xist in the literature , here we use a derivative of t he popular SOBI algorithm 
(|Belouchrani et al. |[l997h . WASOBI (jYeredor I [2OO0I: ITichavskv et al. Il2006al) to separate Gaussian auto- 
regressive (AR) sources in the input data X. Here, the blind source separation follows the same linear 
model as in equation [3] and the mixing matrix A is estimated by a joint diagonalisation of the signals' 
autocorrelation matrices. The unknown correlation matrices of the observed signals for a given lag r, Rx[t] 



R.T 



1 ^ 

^ ]^E^Nx^[n-f r], T = 0,...,M-1 



(C9) 



n=l 
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satisfies the relation 



R,[r] - ARsHA^, Vr (CIO) 



where Rs[r] = £'[s[n]s-^[n + t]] are the source signals' diagonalised correlation matrices ( Yeredor! bOOOf ) . 



Hence, if the correlation matrices are diagonal, ie. the off-diagonal components are zero, the separated 
signals can be said to be independent from each other. The SOBI & WASOBI algorithms estimate A as the 
joint diagonoliscr of a set of correlation matrices. Similar to the EFICA code, we can define an asymptotic 
estimate of the ISR matrix 

■l^Rfe; - T7-; 1—7 2p rni i*^^^ 



= — y2 o-iiajiRk[i - j] (C12) 



2 / . 

^ iJ—O 



where k and / denote the observed and the estimated sources, {i?A;[r]}^Q"^ is the covariance sequence of the 
fc-th so urce, al is the variance of the source and {aii}ffj^^ are the auto-regression coefficients of the Z-th 



source (ITichavskv et al. Il2006a|) . 



C.3. Multi-COMBI 

The algorithms introduced above are highly complementary to each other. Whilst EFICA has an asymp- 
totically efhcient performance in separating non-Gaussian instantaneous mixtures, WASOBI is asymptoti- 
cally efficient in separating Gaussian time-correlated signals. Both these properties are necessary since a real 
data set will have both of the aforementioned properties and its component s would hence not be op timally 
de-mixed if one would only employ one type of algorithm. MULTI-COMBI (|Tichavskv et al. Il2006al) uses a 



clustering technique in which both algorithms are run on the set of unseparated sources X and their inter- 
ference over signal matrices, ISR^^ and ISR^''^, are estimated. The signals are then clustered depending 
on whether their specific ISRki is lower for the EFICA or WASOBI case. Then, the process is repeated 
until all clusters are singeltons, ie. only contain one signal per cluster, and the signals are hence optimally 
separated. 



D. Convergence check 

From the MULTI-COMBI algorithm, we obtain the estimated signal matrix S, an overall ISR matrix as 
well as final ISR"" and ISR"""". Since the algorithms used here use fixed-point convergence techniques, 
the problem of non-repeatability of the separation process is less than for neural network based approaches. 
However, it is common sense to check the stability of the result obtained and to estimate the error on S. 

In order to estimate the stability of the convergence, we perturb the unknown mixing matrix A with 
a random and known mixing matrix P to give a new mixing matrix A2 = PA and equation [3] becomes: 
X = PAS = A2S. This is equivalent to multiplying the whitened signal X with P 

X2 = PX = PC,^^/2(X-X) = A2S (Dl) 

We re- run the separation step and estimate A2 . Since P is known, we can reconstruct the original mixing- 
matrix and compare it with the new result. In the scope of an automated algorithm, the sum of all terms of 
ISRa is compared to the sum of ISRa2 and the result is reported. 

To identify the stochastic nature of the retrieval we furthermore re-run the separation step with the same 
whitened signal, X, akin to a Monte Carlo simulation. We perform i realisations (where i = 10—100 typically) 
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and use the de-mixing matrices Wi to construct mean noise models later on. This way, we propagate the 
signal separation error to the model-fitting in a coherent manner. 



E. Signal separation 

In order to identify the no n-white (i.e. sy stematic) signals in our estimated signal matrix S, we use the 



Ljung-Box portmanteau test (jBrockwell fc D avis 2006). The test statistic, usually denoted by Q, is defined 



by summing the normalised autocorrelations of the individual time series, §1 over a range of lags: 



m ^2 

Q = n{n + 2) y (El) 

r = l 

where is the autocorrelation at lag t and m is the number of observations in the time series. The 
hypothesis of the time series being solely white noise is rejected if Q is bigger than a pre-specified fraction 
of the chi-squared distribution 

Q > Xl-a,h (E2) 



wher e Xi-a h ^^'^ a-quantile of the chi-squared distribution with h degrees of freedom ( Brockwell fc Davis I 



20061 ). Here we take a = 0.05. 
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